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Estimation of the Number of Sources in Unbalanced 
Arrays via Information Theoretic Criteria 

Eran Fishier^ and H. Vincent Poor^ 



■ - - Abstract — Estimating the number of sources impinging on an 
array of sensors is a well known and well investigated problem. 
A common approach for solving this problem is to use an 
information theoretic criterion, such as Minimum Description 
Length (MDL) or the Akaike Information Criterion (AIC). The 
^ MDL estimator is known to be a consistent estimator, robust 
against deviations from the Gaussian assumption, and non-robust 

h— ) against deviations from the point source and/or temporally or 
^ spatially white additive noise assumptions. Over the years several 
alternative estimation algorithms have been proposed and tested. 
Usually, these algorithms are shown, using computer simulations, 

,_,to have improved performance over the MDL estimator, and to 

r I be robust against deviations from the assumed spatial model. 

Nevertheless, these robust algorithms have high computational 
^ complexity, requiring several multi-dimensional searches. 
^ In this paper, motivated by real life problems, a systematic 

I A pproach toward the problem of robust estimation of the number 
of sources using information theoretic criteria is taken. An 
MDL type estimator that is robust against deviation from 
assumption of equal noise level across the array is studied. The 
consistency of this estimator, even when deviations from the equal 
noise level assumption occur, is proven. A novel low-complexity 
implementation method avoiding the need for multi-dimensional 
searches is presented as well, making this estimator a favorable 
choice for practical applications. 
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I. Introduction 
'^A. Motivation 

^> The problem of estimating tlie number of sources impinging 
on a passive array of sensors has received a considerable 

5^ amount of attention during the last two decades. The first 
to address this problem were Wax and Kailath, [1]. In their 
seminal work [1] it is assumed that the additive noise process 
is a spatially and temporally white Gaussian random process. 
Given this assumption the number of sources can be deduced 
from the multiplicity of the received signal correlation matrix's 
smallest eigenvalue [2], [3]. In order to avoid the use of 
subjective thresholds required by multiple hypothesis testing 
detectors [4], Wax and Kailath suggested the use of the 
Minimum Description Length (MDL) criterion for estimating 
the number of sources. The MDL estimator can be interpreted 
as a test for determining the multiplicity of the smallest 
eigenvalue [3]. 
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Following [1], many other papers have addressed this prob- 
lem (see, among others [5], [6], [7], [8], [9], [10]). These 
papers can be divided into two major groups: the first is 
concerned with performance analysis of the MDL estimator 
[11], [2], [12], [13], while the second is concerned with 
improvements on the MDL estimator 

Papers detailing improvements of the MDL estimator can 
be found quite extensively: [5], [14], [6], [15], [16], [9] is 
only a partial list of such works. In many of these works 
the MDL approach is taken, and by exploiting some type of 
prior knowledge, performance improvement is achieved [17], 
[6], [10]. One of the assumptions usually made is that the 
additive noise process is a spatially white process, and the 
robustness of the proposed methods against deviations from 
this assumption is usually assessed via computer simulations 
[9]. In general it can be observed that methods which use 
some kind of prior information are robust, while methods 
which are based on the multiplicity of the smallest eigenvalue 
are non-robust. The reason for these latter estimators' lack of 
robustness is that, when a deviation from the assumed model 
occurs, the multiplicity of the smallest eigenvalue equals one 
[1]. Thus, one can not infer the number of sources from the 
multiplicity of the received signal correlation matrix's smallest 
eigenvalue. On the other hand, methods that are based on 
some prior knowledge, e.g., the array steering vectors, usually 
have high computational complexity, requiring several multi- 
dimensional numerical searches [18]. Moreover, these methods 
are not necessarily consistent when some deviations from the 
assumed model occur, although they exhibit good robustness 
properties in simulations. 

Efficient and robust estimation of the number of sources is 
very important in bio-medical applications (see, for example 
[19], and references therein). For example, in one such ap- 
plication it is of interest to estimate the number of neurons 
reacting to a short stimulus. This is done by placing a very 
large array of sensors over a patient's head, and recording 
the brain activity as received by these sensors. In these bio- 
medical problems no a priori knowledge (e.g., knowledge of 
steering vectors) exists. Moreover, since different sensors are 
at slightly different distances from the patient's skin, the noise 
levels at the outputs of the sensors vary considerably. Thus bio- 
medical applications are an example of one important class of 
problem in which the additive noise is not necessarily spatially 
homogeneous. 

Although robust estimators for the number of sources exist, 
these estimators require some a priori knowledge which is 
often not avilable, and their computational complexity is large, 
as noted above. Thus, computationally efficient and robust 
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C. Information Theoretic Criteria and MDL Estimators 

An Information Theoretic Criterion (ITC) is an estimation 
criterion for choosing between several competing parametric 
models [3]. Given a parameterized family of probability den- 
sities, /x (X|0g) , 6q 6 @q for X = [xi, . . . ,xjv] and for 
various q, an ITC estimator selects q such that [3]: 



estimators for the number of sources are of considerable 
interest. These estimators should not require prior knowledge 
and should be consistent even when deviations from the 
assumed model occur. Such estimators for specific types of 
deviations from the assumed model are developed in this paper. 
In particular, we consider the situation in which the sensor 
noise levels are spatially inhomogeneous. It will be shown 
that while traditional methods for estimating the number of 
sources tend to over-estimate the number of sources under 
these circumstances, our proposed estimator does not have this 
tendency. 



B. Problem Formulation 

Consider an array of p sensors and denote by x(<) the 
received, p-dimensional, signal vector at time instant t. Denote 
hy q < p the number of signals impinging on the array. A 
common model for the received signal vector is [18], [11]: 

x(t) = As(t) + n(i) , i = l,2,...,7V (1) 

where A = [a('0i), a('02), • • • , a('0g)] is a p x q matrix 
composed of q p-dimensional vectors, and a(i/j) lies on the 
array manifold {A = a.{xp)\ip G NP}, where SI/ denotes 
a set of parameters describing the array response. a{ip) is 
called the array response vector or the steering vector and 
A is referred to as the steering matrix, and ipi is a vec- 
tor of unknown parameters associated with the ith source. 
s(t) = •■■ Sq{t)]'^ is a white complex, stationary 

Gaussian random processes, with zero means and positive 
definite correlation matrix, Rg; n(t) is a temporally white 
complex Gaussian vector random process, independent of 
the signals, with zero mean and correlation matrix given by 
diag {[(Tf,a^, a^]), where diag {[erf, cr|, . . . , cr'^]) denotes 
a diagonal matrix with the vector [ctJ, ctI, . . . , Cp] on its diag- 
onal. This correlation matrix represents the scenario in which 
each sensor potentially faces a different noise level. Define 

= p Tn=l ^« = ~ ^ = ■ ■ , Wp]. 

The additive noise correlation matrix can be described with the 
aid of and w as follows E |n(i)n-^(t)} = cr^I + diag (w). 
This alternate representation simplifies some of the proofs and 
derivations in the sequel. Note, that the vector w represents a 
deviation from the assumption that the noise level is constant 
across the array. Finally, all the elements of the steering matrix, 
A, are assumed to be unknown [1], with the only restriction 
being that A is of full rank. In the sequel the Gaussian 
assumption will be eased. 

We denote by 6q the set of unknown parameters assuming 
q sources, that is Oq = [Rg.g, A^, cr^ g, w^], where Rs,g is the 
transmitted signals' correlation matrix assuming q sources; Aq 
is the steering matrix assuming q sources; cr^ ^ is the white 
noise level; and Wg is the vector containing the parameters 
representing the deviations from the spatially white noise 
assumption. The parameter space of the unknown parameters 
given q sources is denoted by 0g. The problem is to estimate 
q based on N independent snapshots of the array output, 
xi = x(ti), . . . , xjv = x(tjv) [1]- 



giTC = argminlTC(g) = argmin |-L y0qj + penalty(g)| (2) 



where L{6q) = log/x(X|0g) is the log-likelihood of the 
measurements, 6q = argmaxg g© /x (X|0g) is the max- 
imum likelihood (ML) estimate of the unknown parameters 
given the qth family of distributions, and penalty(g) is some 
general penalty function associated with the particular ITC 
used. The MDL and AIC estimators are given by penalty(g) = 
O.5|0g| log (A^) and penalty(p) — \&q\ respectively, where 
|0g| is the number of free parameters in &q [20], [21], [1]. It 
is well known that, asymptotically and under certain regularity 
conditions, the MDL estimator minimizes the description 
length (measured in bits) of both the measurements, X, and 
the model, 9q [22], while the AIC criterion minimizes the 
Kullback-Liebler divergence between the various models and 
the true one. In the rest of the paper we will consider only the 
MDL criterion, although other penalty functions can also be 
treated similarly. 

Although in many problems associated with array process- 
ing, e.g., direction of arrival (DOA) estimation, one has some 
prior knowledge about the array structure, when estimating the 
number of sources this prior knowledge is usually ignored [1], 
[18]. The reason for this is that by ignoring the array structure 
and assuming Gaussian signals and noise, and w = 0, the 
resulting MDL estimator Q, termed here the Gaussian-MDL 
(GMDL) estimator [11], has a simple closed form expression 
given by [1] 



g I GMDL = arg mm 

g=0,...,p— 1 



-iVlog 



nr=5+i 



p-q 



+ i(g(2p -<?) + !) log iV 



(3) 



where h > h > ■ ■ ■ > Ip are the eigenvalues of the empirical 
received signal's correlation matrix, R = jjj^^i^f- 
well known that when w = the GMDL estimator is a 
consistent estimator of the number of sources, while when 
w ^ 0, the GMDL estimator, Q, is not consistent and in fact, 
as the number of snapshots approaches infinity, the probability 
of error incurred by the GMDL estimator approaches one [11]. 

Denote by TZq the set of all positive definite, rank q, 
Hermitian, p x p matrices, and by W the set of all zero mean 
p-length vectors. Given the assumptions made in the problem 
formulation, the MDL estimator for estimating the number 
of sources, denoted hereafter as the Robust-MDL (RMDL) 
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estimator, is given by. 



gRMDL = arg mm 

g=0,...,p— 1 



{ TV log I AgRs,gAf + + diag (w,) 
+Tr ( f A,R,,,Af + &lA + diag (W,; 



R 



+ -{q{2p - q)+p) log N 



(4) 



where A,, Rg.^, ct^ Wg are the ML estimates of the un- 
known parameters assuming q sources, that is 



Ag,Rs^q,al^g,Wg = arg 



max 



[-A^log I AgRs,gAf + al^l + diag (w,) 



-Tr{(A,Rs,,Af +a,2,I + diag 



(w,))"'r} 



(5) 



Note that since A^Rg.qA^ G TZq, by using eigen- 
decomposition we can write AgRg ^A^ — -^iVivf^, 
where {v;} is an orthonormal set of vectors. Hence, the vector 
of unknown parameters assuming q sources is also given by 



Og = [Ai, . . . , A,,vf , . . . ,v!',w^,cr^] 



(6) 



D. Organization of the Paper 

The rest of this paper is organized as follows: In Section II 
we discuss the indentifiabilty of the estimation problem and 
we prove the consistency of the RMDL estimator In Section 
III we describe a low-complexity algorithm for approximating 
the RMDL estimator, (0}, and we discuss the properties of 
this algorithm. In Section IV we present empirical results. In 
Section V some concluding remarks are provided. 

ii. identifiability and consistency of the rmdl 
Estimator 

A. Identifiability 

Consider a parameterized family of probability density 
functions (pdf's) /x(x|0), 9 E &. This family of den- 
sities is said to be identifiable if for every 6 ^ 6' , the 
Kullback-Liebler divergence between /x (x|0) and /x (x|0') 
is greater than zero, that is £) (/x (x|0) | |/x (x|0')) > 0, 
where _D(/(x)||5(x)) = / /log^ is the Kullback-Leibler 
divergence between /(x) and g{'x.) [23]. This condition insures 
that there is a one-to-one relationship between the parameter 
space and the statistical properties of the measurements. 

The problem discussed in Section II-BI is a model order 
selection problem [22]. This problem is unidentifiable if it 
is possible to find for some k ^ I two points in the parameter 
space. Ok e &k and 6», e 0; such that /(-l^fe) = / (-I©,). 
Unfortunately, we can, in fact, identify two such points leading 
to the conclusion that the estimation problem discussed in 
Section II-BI is unidentifiable. The received signal's pdf is 
fully characterized by the received signal's correlation matrix. 
Thus, in order to prove that the problem is unidentifiable, it 
suffices to find two different parameter values under which the 
corresponding received signal's correlation matrices are equal. 



Take, for example, the following received signal correlation 
matrix: diag ([11, 10.5, 9.5, 10]). This correlation matrix can 
result from a noise-only scenario with = 10.25 and w = 
[0.75, 0.25, — 0.75, — 0.25], or from a one source scenario 
where cr^ = io,w = [0,0.5,- 0.5, 0], a = [1,0,0,0]^, and 
Rg — 1. Thus we have found two scenarios, the first corre- 
sponding to a noise only scenario, and the other corresponding 
to a one source scenario, such that the distribution of the 
received signal vector is the same. Thus, this example shows 
that the estimation problem formulated is unidentifiable. 

In order to make the estimation problem identifiable, all the 
points having the same received signal pdf must be removed 
from the parameter space except one. As is the custom in 
model order selection problems, among all the points having 
the same received signal pdf, the one with the smallest number 
of sources, that is, the point with the lowest number of 
unknown parameters, is left in the parameter space, and the 
remaining ones are deleted. The main question that arises 
is whether most of the points in the parameter space are 
identifiable or not. Fortunately, the answer to this question 
is yes; that is, most of the points in the parameter space 
are identifiable. The following lemma characterizes all the 
unidentifiable points in the parameter space. 

Lemma 1: Suppose q < p. Then 9q is an unidentifiable 
point in the parameter space if and only if the matrix 
"^j'^? contains aej = [Oj_i, 1, Op_j] as its jth row 
for some j G [1, . . . ,p], where defined in (|6}. 

Proof of Lemma 1: See Appendix |l] 

The proof of Lemma provides an interesting physical 
interpretation of the unidentifiable points. In particular, it can 
be seen from the proof of the lemma that all the unidentifiable 
points are similar to the above example used to show that 
the problem is unidentifiable. That is, an unidentifiable point 
corresponds to a scenario where there are, say q sources, and 
one of them is received at only one of the sensors. Since 
this source can not be distinguished from a deviation, from 
some nominal value, of the noise level in the corresponding 
element, this scenario could be confused with a different 
scenario having one fewer source, and an increase in the noise 
level at the proper element. From a practical viewpoint, this 
type of situation is a rarity. 

B. Consistency of the RMDL Estimator 

In the previous subsection it was proved that the estimation 
problem defined in Section lLBl is unidentifiable. Nevertheless, 
it was also argued that only a small portion of the points in the 
parameter space are unidentifiable, meaning that by excluding 
these points from the parameter space the problem becomes 
identifiable. For the rest of this paper, we consider these points 
to be excluded from the parameter space. Once the estimation 
problem has been shown to be identifiable, it is possible to 
infer the number of sources from the measurements. However 
for a specific estimator, the issue of consistency must be 
considered. 

In model order selection, the common performance measure 
is the probability of error, that is Pg = P (q 7^ q) [3]. In what 
follows the RMDL estimator, (0}, is proven to be a consistent 



estimator, that is lim 
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Lemma 2: The RMDL estimator, (0}, is a consistent esti- 
mator of the number of sources. 

Proof of Lemma 2: See Appendix IIII 

Deviations from the assumption of spatial homogeneity are 
part of our general model. Thus, even if the noise levels at 
various sensor are not equal, according to Lemma|2the RMDL 
estimator, @, is still consistent. That is, the probability of 
error of the RMDL estimator still converges to zero even in 
the presence of deviations from assumption of equal noise 
levels. 

It is well known that the GMDL estimator, Q, is a non- 
robust estimator when the noise levels at the various sensor are 
not equal, i.e., the probability of error of the GMDL estimator 
approaches one as iV — > cxd. Nevertheless, it is known that 
the GMDL estimator is robust against statistical mismodeling. 
Under very weak regularity conditions, if the transmitted 
signal and/or the additive noise are non-Gaussian, then the 
probability of error of the GMDL estimator still converges to 
zero. Fortunately, it can be shown that the RMDL estimator, 
is robust against statistical mismodeling as well. Being 
robust against both statistical and spatial mismodeling is an 
advantage of the RMDL estimator over the GMDL estimator. 

We conclude this subsection by proving that the RMDL 
estimator, Q is a consistent estimator even in the presence 
of statistical mismodeling. Denote by g{-x.) the actual pdf 
of the received signal at some time instant, and by / (x|0) 
the assumed measurement pdf, i.e., the Gaussian distribution. 
Note that it is still assumed that Rx — £'{xx^} has the 
following form Rx = AgRgA^ + o-,^I + diag(w). Let 
Eg {/i(x)} — J /i(x)(7(x)dx. The following lemma establishes 
the consistency of the RMDL estimator when the sources are 
not Gaussian 

Lemma 3: Assume that Eg { ^^^^^^ ^^2^2^ | and 

^9 { ^ dofdey^^'' } ^''^ist and are finite. Then the probability of 
error of the RMDL estimator converges to zero as iV — > oo. 
Proof of Lemma 3: See Appendix IIIII 

III. A Practical Estimation Algorithm 

In the previous section the asymptotic properties of the 
RMDL estimators were considered. It was proven that the 
RMDL estimator is both a consistent and robust estimator of 
the number of sources. These two properties make the RMDL 
estimator very appealing for use in practical problems. How- 
ever the computational complexity of the RMDL estimator 
is still very high compared to that of the GMDL estimator. 
Recall that in order to implement the RMDL estimator ML 
estimates of the unknown parameters must be found for every 
possible number of sources. Since no closed-form expression 
for these ML estimates exists, multi-dimensional numerical 
searches must be used in order to find them. Even for moderate 
array sizes, e.g., p — 6, the number of unknown parameters is 
a few dozen, which makes the task of finding the ML estimates 
impractical. 

In order to overcome the computational burden of com- 
puting the ML estimates, we propose to replace the ML esti- 
mates by estimates obtained using a low-complexity estimation 
algorithm. A reasonable criterion used in array processing 



applications is to choose as an estimate the parameter vector 
that minimizes the Frobenius norm of the error matrix [24], 
[25]; that is 



e 



q.LS 



arg min ||(R - Rx(0g))||| 



e„e©, 



arg mm 



ig Tr{(R-Rx(0g))(R-Rx(09))''} 



= arg mi_n ^ ^ I [(R - Rx(0g))(R - Rx(0,))]4) 

t'qt©q .1 

J = l J = l 

and the corresponding estimate for the number of sources is 
given by. 



q = arg mm 

q=0,...,p-l 



-L (Oqxs) + q{2p - q) 



logN 



(8) 



Replacing the ML estimates with their LS counterparts 
raises two important questions. One is whether replacing the 
ML estimates with the LS estimates results in performance 
loss; and the second is whether efficient algorithms for com- 
puting the LS estimates exist. Fortunately, it can be demon- 
strated that no performance loss is incurred (asymptotically) 
by replacing the ML estimates with the LS estimates, and 
an efficient algorithm for computing the LS estimates exists. 
It was pointed out by one of the reviewers that for finite 
sample sizes since the ML estimates are replaced by the LS 
estimates, it is not guaranteed that L {Oq^LS^ < L {Oq+ixsj ■ 
This problem can be easily solved by noting that because 
the problem is a nested hypotheses problem hatOq G Qq+i- 
Therefore, if L{eq\Ls) < L{9q+i\Ls), we can use L{eq^Ls) 
instead of i(^g+i|Ls) in the MDL formula. 

Our problem is a model order selection problem, and our 
main interest is in the probability of error of the proposed 
estimator. In [11], it is demonstrated that the MDL's asymp- 
totic probability of error depends on 6q and 6*_^ G ®q-i, 
where e*q_^ = argmine^_, L'(/(x|6>5)||/(x|0g_i)). It is 
easily seen that Oq_^ is the limit of the ML estimates under 

the assumption of g — 1 sources, i.e., 6q^i\6q — > ^g-i- 
Analysis similar to that in [11] demonstrates that if a consistent 
estimator is used instead of the ML estimator in the MDL 
estimator, then the asymptotic probability of detection remains 
the same. Since the LS estimator is a consistent estimator of 
the unknown parameters, the asymptotic performance of the 
RMDE's simplified version, (|8}, is the same as the asymptotic 
performance of the RMDL estimator, (|4}. 

Similarly to the ML estimates, 0g,LS is the solution of a 
nonlinear programming problem, requiring brute-force multi- 
dimensional search. Nevertheless, based on the concept of 
serial interference cancellation (SIC) [26], in what follows a 
novel algorithm for finding 9q Ls is suggested. In this algo- 
rithm the unknown parameters are divided into two groups, 
and given the estimate of one group of unknown parameters, 
an estimate of a second group of unknown parameters is 
constructed. The estimates are constructed in such a way as 
to insure a decrease in the Frobenius norm of the error matrix 
after each iteration. The estimation process iterates between 
the two groups of unknown parameters, until the estimates 
converge to a stationary point. 
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In multiple access communications two (or more) users 
transmit information over two non-orthogonal subspaces. The 
serial interference cancellation multiuser detection algorithm 
for data detection in such situations works as follows. First, 
the unknown parameters associated with the first user are 
estimated. Next, an error signal is constructed by subtracting 
from the received signal the estimated first user's transmitted 
signal. In the next stage, the unknown parameters associated 
with the second user are estimated from the error signal. In 
the next iteration, the unknown parameters associated with 
the first user are re-estimated based on the received signal 
after subtraction of the estimated second user's transmitted 
signal. This iterative process is continued until convergence is 
reached. 

The principle behind the SIC multiuser detector can 
be used for constructing a novel low-complexity esti- 
mation algorithm for estimating the unknown parameters 
in the present situation. In what follows such a low- 
complexity estimation algorithm is described and its prop- 
erties are discussed. The unknown parameters in our es- 
timation problem are [Ai, . . . , A^, , . . . , v^, w-'", cr^J, or 
equivalently, A^Rg, g A^, cr^ q, Wg. These unknown parame- 
ters are divided into two groups. The first group contains 
[Ai, Aq, vf, v^, cr^], or equivalently A^Rg^gA^ and 
(T^ ^, while the second contains w^. The first group corre- 
sponds to the unknown parameters of the ideal point source 
plus spatially white additive noise model, while the second 
corresponds to the unknown parameters representing the de- 
viations from the ideal model. In a sense, can be regarded 
as the unknown parameters that robustify the estimator. The 
input to the algorithm is Rx = jjJ2tLi^{ti)'^{ti)^ which 
is a sufficient statistic for estimating the unknown parameters, 
assuming Gaussian sources and noise. 

Denote by tT^ ,j, A^R^^ (A^) , and the estimates of 
the unknown parameters after the jth iteration. The proposed 
algorithm is implemented as follows: In the first iteration, 
cr^^jAjRgq (-^q) estimated from Rx- The best esti- 

mates, in both the ML sense and the Least Squares (LS) sense 
(see appendix II VL are 



1 



A^Ri 



(A,; 



H 



P 

■E' 

i=q+l 

q 

E 

1=1 



(9) 



v.vf (10) 



where li > ■ ■ ■ > L and vi, 



, Vp are, respectively, the 



eigenvalues and eigenvectors of Rx. In the next step an error 
matrix, denoted by E, is constructed by subtracting from Rx 
the estimate for the estimated part of the received signal's 
correlation matrix corresponding to the ideal model, that is 
AjRg ^ (A^) + ((T,\ ^) I. Thus, the error matrix is given 
by 



E = R 



A^Ri 



H 



n.l) 



(11) 



Next, is estimated from E, and the best estimate in both 
the ML and LS sense is. 



At the jth iteration, we apply the same procedure except that 
ai „ and A:^R:^ „ (A^) are estimated from R — w^~^, while 

is estimated from R — A^R^ ^ (-^9) ^ {'^n q) ^■ 
Summarizing the above, our proposed estimation algorithm 
is given as follows: 

1) Initialize E = R. 

2) Compute li > • ■ • Zp, vi, . . . , Vj, the eigenvalues and the 
corresponding eigenvectors of E. 

3) Compute the following estimates, 



\ 



1 ^ 

- E 



(13) 



i=g+l 
q 



A,Rg,, (A,)^ - E - (^".«)') ^^^^ (14) 
= diag (r - A,Rg,, (A,)^ - (a„,,)' I()l5) 

4) Compute E = R - w^. 

5) If the estimates have stabilized, stop; otherwise return 
to step 2. 

A major question that arises is whether this algorithm is 
guaranteed to converge and, if so, whether the stationary 
point of the algorithm is optimal in some sense. Fortunately, 
the answers to these questions are yes. In Appendix IIVI it 
is proven that in each step of the algorithm, the Frobenius 
norm of the error matrix decreases, that is ||R — R(0^'^)|||. > 
|R — R(0^+^)|||,, where 0^ is the estimate of the unknown 
parameters after the nth iteration. This also proves that the 
proposed algorithm converge to a local minimum of the LS 
cost function. 

Consider our proposed iterative algorithm. The most com- 
plex operation in our algorithm is the eigenvalue decom- 
position whose complexity is 0{p^). Since the process is 
repeated p times (one for each possible number of sources), 
the complexity of our algorithm is 0{p'^) per iteration. 

Since no closed expression for the ML estimates exists, 
some numerical maximization method must be used. There- 
fore, the complexity of the ML estimator depends on the 
number of iterations and the exact numerical maximization 
method used. However, we can still demonstrate that the com- 
plexity of the ML estimator is higher than that of our proposed 
algorithm. Since efficient numerical maximization algorithms 
require the computation of the derivative of the likelihood 
function, we examine the complexity of computing this deriva- 
tive. The most complex operation in computing the derivative 



OTr{R-i(e)R} 

89, 



Tr{[R-i(0)RRx'W]^5iW|^ 



= diag (E) . 



(12) 



which has a complexity of 0{p ). This operation has to be 
repeated p times, one for each possible number of sources. 
Therefore the complexity of computing the derivative of the 
likelihood function per iteration is 0{p'^). It follows that for 
p > 3, the complexity of the ML estimator is higher by several 
orders of magnitude than our proposed iterative algorithm. 

IV. Simulations 

In this subsection simulation results with synthetic data are 
presented. We consider a uniform linear array with 10 ele- 
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ments, and assume three equal-power and independent sources 
having signal-to-noise ratio (SNR) per element of dB. 
The sources' directions of arrival (DOA's) are taken to be 
[0° 5.7° 11.4°]. We consider two cases: the first corresponds 
to complex Gaussian sources, i.e., s{t) - CN{0,cr^); and 
the second corresponds to sources that are distributed as 
complex Laplacian sources, i.e., K(s(t)) and $J(s(i)) are 
independent random variables having pdf —e~~. The second 
case corresponds to impulsive sources usually found in bio- 
medical application. 

We first consider the case in which w = 0; i.e., the noise 
is spatially white. Figure depicts the probability of correct 
decision in this case of both the GMDL estimator and the 
RMDL estimator when used with the estimates computed by 
the iterative algorithm. Since no deviations from the spatial 
white noise model exist in this case, the GMDL estimator is 
both consistent and robust, and indeed the empirical prob- 
ability of error of the GMDL estimator converges to zero 
whether the sources are Gaussian or Laplacian. The RMDL 
estimator is also both a consistent and a robust estimator, and 
again the empirical probability of error of the RMDL estimator 
converges to zero as well, independent of the source distri- 
bution. These empirical results demonstrate that the GMDL 
estimator is superior to the RMDL estimator in this situation, 
an additional 100 samples are required by the RMDL estimator 
in order to achieve the same probability of correct decision as 
the GMDL estimator In [27] it was proven that by exploiting 
more prior information the performance of the MDL estimator 
improves. This explains the superiority of the GMDL estimator 
over the RMDL estimator, since the GMDL estimator makes 
use of the spatial whiteness of the additive noise process, while 
the RMDL estimator ignores this information. 

In practice, multi-channel receivers are used in DOA esti- 
mation systems. The noise level in each receiver is different 
and hence the system has to be calibrated. Due to finite 
integration time, errors and different drifts in each channel, 
small differences in the noise levels at the different receiver 
channels exist. In the next example this scenario is simu- 
lated. For simulating this scenario w is taken to be w = 
9/10, —7/10, . . . , 9/30]. This w represents a scenario in 
which the noise level in each receiver is different from the 
nominal noise level by no more than —10 dB. Figure|2]depicts 
the probability of correct decision of both the GMDL and the 
RMDL estimators as functions of the number of snapshots 
taken for both Gaussian and Laplacian sources. 

The multiplicity of the received signal correlation matrix's 
smallest eigenvalue is equal to one, and hence the GMDL 
estimator is not consistent, that is P (q 7^ 3) ^ 1 [3]. From 
Fig. Hit is seen that the empirical probability of error of the 
GMDL estimator converges to one as the number of snap- 
shots increases. Nevertheless, it can be seen that this happen 
only when the number of snapshots is quite large (about 
10,000). This phenomenon can be explained by examining 
the eigenvalues of the received signal's correlation matrix. 
The eigenvalues of the received signal's correlation matrix 
are given by [20.1, 10.9, 1.93, 1.07, • • • , 0.92]. For the GMDL 
estimator, the simulated scenario corresponds to a scenario 



where p — 1 sources exists, the noise level equals to 0.9, and 
the SNR of the fourth strongest source at the array output is 
— 7dB. The GMDL requires about 10,000 snapshots in order 
for the probability of detection of this weak "virtual" source 
to be noticeable. As the number of snapshots increases, the 
probability of detection of this weak virtual source increases 
as well, causing the probability of correct decision to decrease 
to zero. On the other hand, it can be seen that the probability 
of error of the RMDL estimator converges to zero as the 
number of snapshots increases for both the Gaussian and the 
Laplacian sources. This demonstrate both the consistency and 
the robustness of the RMDL estimator. 

In Figure |3] we study the spatial separation between the 
sources required for reliable detection. We assume that the 
three sources' directions of arrival are [0,p,2p], 15,000 snap- 
shots are taken by the receiver, and the SNR per element is 
either dB or 5 dB. Figure |3]depicts the probability of correct 
decision of both the GMDL and the RMDL estimators for both 
Gaussian and Laplacian sources as a function of p. 

In the figure we can see again that the RMDL estimator 
outperforms the GMDL estimator. Even if large separation 
between the sources exists, the probability of correct decision 
of the GMDL estimator does not approach one. The prob- 
ability of correct decision of the RMDL estimator, on the 
other hand, approaches one with the increase in the separation 
between the sources. This difference can be explained with 
the aid of the received signal correlation matrix's eigenvalue 
spectrum. The received signal correlation matrix eigenvalues 
equal [11.54, 11.05, 10.39, 1.07, 0.9237]. The three high- 
est eigenvalues correspond to the three sources. However, 
due to the different noise level in each sensor, the rest of 
the eigenvalues are not equal to the noise level. The large 
number of snapshots enables the GMDL estimator to detect 
the differences in the weakest eigenvalues as vaUd sources, 
which results in an error event. However, if the number of 
snapshots is reduced, the GMDL estimator will not detect 
these differences. Nonetheless, if the number of snapshot is 
reduced, and a valid weak source exists, the GMDL estimator 
will not detect this valid source. 

As discussed in the beginning of this paper, in biological 
applications the noise level may vary considerably between 
the different receiver channels. Thus, large deviations from the 
ideal model are expected in such systems. For simulating this 
type of scenario we take w = ^[—9/10, —7/10, . . .,9/10], 
which represents deviations of up to —3 dB from the nominal 
noise level. Figure |3 depicts the probabilities of correct deci- 
sion of the GMDL and the RMDL estimators as functions of 
the number of snapshots taken. 

It can be seen that in this scenario the empirical er- 
ror probability of the GMDL estimator approaches one 
even when the number of snapshots is small (about 750). 
Again, this can be explained by examining the received 
signal correlation matrix's eigenvalues, which are equal to 
[20.13, 10. 93, 2, 1.36,..., 0.62]. The GMDL estimator inter- 
prets this scenario as a p — 1 sources scenario with the noise 
level equal to 0.5, and the SNR of the fourth strongest source 
at the array output is 6 dB. Due to its high SNR, only a small 
number of snapshots are required for detecting this "virtual" 
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source, and by detecting this virtual source an error event is 
created. As the number of snapshots increases, the probabiHty 
of detection of this virtual source increases as well, causing the 
probability of correct decision to decrease to zero. Again, it 
can be seen that the probability of error of the RMDL estimator 
converges to zero as the number of snapshots increases. 

In the last figure. Figure |5] we study the spatial sep- 
aration between the sources required for reliable detection 
when the deviation from the equal noise power assumption 
is large. We assume that three sources' directions of arrival 
are [0, p, 2p], 250 snapshots are taken by the receiver, and the 
SNR per element is either dB or 5 dB. Figure |5] depicts the 
probabilities of correct decision of both the GMDL and the 
RMDL estimators for the Gaussian and Laplacian sources as 
a function of p. Again, we can see that the RMDL estimator 
outperforms the GMDL estimator. Even for large separation 
between the sources, the deviation from the equal noise level 
assumption results in a change in the eigenvalue structure. This 
change is detected by the GMDL estimator as an additional 
sources, and hence an error event occurs. 

V. Summary and Concluding Remarks 

In this paper the problem of robust estimation of the 
number of sources impinging on an array of sensors has 
been addressed. It has been demonstrated that by proper use 
of additional unknown parameters, the resulting estimator, 
denoted as the RMDL estimator, is robust against both spatial 
and statistical mismodeling. This situation represents an im- 
provement on the traditional MDL estimator which is robust 
only against statistical mismodeling. In addition, a novel low- 
complexity algorithm for computing the estimates of the un- 
known parameters has been presented. It has been shown that 
this algorithm converges to the LS estimates of the unknown 
parameters. On one hand, the computational complexity of 
the proposed estimator is higher than the complexity of the 
traditional MDL estimator; on the other hand the complexity 
is far less than the complexity of known robust estimators 
which require several multi-dimensional searches. 

The proposed estimation algorithm can be used to robustify 
other estimation algorithms as well. Take for example the MU- 
SIC algorithm for estimating DOAs [28]. It is well known that 
the MUSIC algorithm is not robust against spatial mismodel- 
ing. Even slight spatial mismodeling can cause a large error in 
the estimated signal subspace, leading to substantial estimation 
errors. The use of our estimation technique to improve the 
robustness of the MUSIC algorithm is an interesting topic for 
further study. 
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Appendix I 
Proof of Lemma[1] 

In this appendix Lemma [J is proven by a way of induction 
on the number of sources. We first note that since X]i=i ^j^f^ 
is an Hermitian matrix, then if X]i=i "^i^f contains as its 
jth row it also contains ej as its jth column. 

We first assume g = 0; that is, the noise-only scenario. Since 
the noise-only scenario is always identifiable, the lemma holds 
for this case. 

Now assume that the lemma holds for q sources, that is for 
every identifiable point Oq £ &q, X]i=i '^i'^? does not have 
as one of its rows for every j = 1, . . . ,p. 
The following two lemmas will be essential in what follows. 

Lemma 4: Assume that, 0q e 0, is an identifiable point, 
and denote by R(0g) = AiVivf^. Denote by > • • • > 

Iq+i > = • • • = and {c;} are the eigenvalues and 
their corresponding eigenvectors of the matrix R(0g) + 6^6^^. 
Assume that rank(R(0q) -|- ejef) = g + 1, then, J^l^i ^i^f 
has Gj as his jth row. 

Proof of Lemma 4: Assume with out loss of generality that 



1. Since {ci} is an ortho-normal basis, X]i=i 



H 



X]i=i Cicf^ + X]i'=g+2 ^i^'t ~ I- According to the lemma we 
have to prove that Yl!i=i Cjcf^ has the following form. 



9+1 

i=l 



1 

0^ 





M 



(16) 



This will happen if and only if YTi=q+2 '^i^f following 
form 



i=q+2 





0^ 





M' 



(17) 



H 



where M+M' = I, (recall that Y.ltl c^cf +X;L?+2 ■ 
I). It is easy to verify that 'YTi=q+2 ^i^f ^^^^ have the form 
given by if and only if [ci]i = for every / > g -f- 1, so 
proving the lemma is equivalent to proving that [c;]i — for 
every / > q + 1. Assume that / > q + 1. From the properties 
of eigen-decomposition it follows that 



(R(0,)+eief)Q 



9+1 

E 

1=1 



kcicfci 




A,(vfQ)v, +ei[Q]i (18) 



, Vp are, respectively, the 



the subspace spanned by {vi}|^j^, then vfci — for every 
i < q. Thus, by using ( I18> . 



ei[c;]i = 0, 



(19) 



which is possible if and only if [c(]i ~ 0. 

Lemma 5: rank(R(0q) + e^ef^) ^ q + 1. 

Proof of Lemma 5: Without loss of generality (wig) it is 
proven that rank(R(0^) + eie^) = q + 1, Assume that 
rank(R(0g) + eief ) q. Thus the rank of both R(0g) 
and R(0g) + eie^ are equal. Hence, it is possible to find 
k constants, denoted by ai,. . . ,aq, not all of them equal to 
zero, such that 



ei 



i=l 



From i20\ it is easy to see that 

R.{9q) + eief = VAV^ 



(20) 



(21) 



where V = [vi,...,Vg], and A is some qx q diagonal 
matrix. Since 9q is an identifiable point, according the in- 
duction assumption there exists / > q such that [v;]i 
(otherwise according to the previous lemma the point would 
have been unidentifiable contredicting our assumption that 9q 
is identifiable). As such. 



(r(0,) + eief) V, = VAV^v, = - ei[vi]i 



(22) 



where Ai > • • • > Ap and vi, 

eigenvalues and eigenvectors of R(0^). Since {ci}^^^ spans sumption of q sources, log/x(X|0g) > log/x(X|0g), where 



which is possible if and only if [v;]i = 0, This is a 
contradiction, and Lemma |5] follows. 

Define g{9q) to be a function taking as an argument an 
identifiable point in &q, and returning a subset of @q+i, 
such that for every 9q+i £ di^q)^ Rx(Sg+i) ~ RxC^q), 
and for every Oq+i e .g(6»fe), Rx(0g+i) ^ Rxl^q)- It 
is easy to see from Lemma |5] that G if and 

only if, R(0,+i) = R(0,) + [w(0,)],e,ef , = 
^liOq) + ^ E,^Jw(0fe)],, and M9q+,% = M9q)]u - 
Tlij^i [^{^q)]j where k ^ i, and 1 < « < p. From Lemma 
|5]it is easy to see that (R(0g+i)) has rank q + 1, and from 
Lemma@]it is easy to see that the conditions stated in Lemma 
[2 are necessary. Since every unidentifiable point belongs to 
some g{9q) then the lemma is proved. 

Appendix II 
Proof of Lemma|2] 

In this appendix the consistency of the RMDL estimator is 
proved. Specifically it is shown that the probability of error 
of the RMDL estimator converges to zero as the number of 
snapshots increases to infinity. An error event will occur if and 
only if there exists k^q such that RMDL(g) - RMDL(fc) > 
0. Thus in order to prove the lemma it suffice to prove that 
for every k ^ q, P (RMDL(g) - RMDL(fc) > 0) ^ 0. 

Assume that k > q. Since the problem is a nested hypothesis 
problem, log/x(X|0fc) < log /x(X|^p_i) [11]. Also, since 
6q maximizes the likelihood of the measurements under the as- 
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6q is the true parameter value. Thus RMDL(g) - RMDL(A:) 
can be bounded as follows, 

RMDL((7) - RMDL(fc) = 
-log/x(X|^,) + log/x(X|4) + 

{q{2p-q)-k{2p-k))^ 

<-log/x(X|0,) + log/x(X|0p_i) 

log 

+ {q{2p-q)-k{2p-k))^. 

Using the spectral representation theorem, the received sig- 
nal's correlation matrix is equal to Rx(0q) = X^f^i K^i^f , 



where Ai > • • • > Ap and vi, 



, Vq are, respectively, the 



p-i 



such 



eigenvalues and their corresponding eigenvectors of Rx(0q). 
Thus there exists a point, denoted by 6*_i G © 
that Rx(0g) - Rx(0;-i) (take A = [vi,...,v, 
diag (Ai — A 



p-i- 



, Rs 



, A2 - Ai),o-„ = Ai,w = 0). Since 



p-i 



is an inner point of 0p_i, one can use the theory of like- 
lihood [29] to show that asymptotically, — 21og/x(X|0g) + 
21og/x(X|0p_i) = 21og/x(X|0p_i) -21og/x(X|0;_i) 
is distributed as a chi-square random variable with degrees 
of freedoms equal to the number of unknown parameters, 
(p^ — 1). We next note that since q < k, {q{2p — q) — 
k{2p — k)) '°|^ —00 as N approaches to infinity. Thus, as 
the number of measurements increases, the probability that 
log/x(XJ6/g) + log/x(X|0p_i) exceeds \{q{2p - q) - 



k{2p ~ k)) 



log JV I 



is given by the tail of the chi-square dis- 



tribution, which approaches zero as N approaches to infinity. 
Thus, 

P (RMDL((7) - RMDL(/c) > 0) < 
Pr ( - log /x (X| 0, ) + log /x (X 1 1 ) 



log N 

+ {q{2p -q)- k{2p ~ k))^— > 



0,(23) 



which complete the first part of the consistency proof. 

Now, assume fc < g. It was previously shown that under 
very weak conditions the probability of miss of every MDL 
estimator converges to zero as ^ cxd [11]. In particular the 
probability of miss of the RMDL estimator, which satisfies 
the condition stated in [11] is the MDL estimator, converges 
to zero as N ~> 00. 

Appendix III 
Proof of Lemma|3] 

The proof of Lemma |3] is very similar to the proof of 
Lemma 121 and thus only the necessary modifications for the 
proof of lemma |2l are detailed. Again, in order to prove that 
the probability of error converges to zero we will prove that 
Pr {RMDL(g) - RMDL(fc) > 0} ^ 0. 

Assume k > q. It is easy to see from the proof 
of Lemma H that Pr (RMDL(g) - RMDL(A:) > 0) < 
Pr(- log /x(X|0,) + log /x(X|0p_i) + (<z(2p - q) - k{2p - 
^-j-jio^jv ^ Qy jj. known that asymptotically, given 
the conditions stated in the lemma, log /x(X|0p_i) — 
log /x(X|0*_]^) is distributed as a weighted sum of chi-square 
random variables having one degree of freedom [30]. Thus by 



implying the same reasoning used in the proof of Lemma |2 
it easily shown that P (RMDL(q) - RMDL(fc) > 0) -» 0. 

Assume k < q. Again, this case is a special case of a more 
general theorem presented in [1 1] and hence we omit a specific 
proof for this case. 



Appendix IV 
Convergence of the Proposed Estimation 
Algorithm 

Denote by 0^ = [wg,„, ct^ „, Ag,„Rs^g^„A^„] the estimate 
of 6q after the nth iteration, and by i?„ the error between R 
and Rx(^g), that is En = R-Rx(^^). In this appendix it is 
proven that Tr{£'„£',f } > Tr {£'„+ii;^+J. The following 
lemma will be very helpful in the sequel. 

Lemma 6: Let X be a p x p Hermitian matrix, with 
eigenvalue representation X = X^iLi ca^i'^f ■ The closest 
(in the Frobenius norm sense) p x p Hermitian matrix X, 

such that X — J2i=i ^i^i^f + ' Sf^q+i ^iC^ is the matrix 

V I ■ 



=9+1 p-q ^^^i 



x = ELi-<v,vf + EL,+i 

Proof of Lemma 6: For the sake of simplicity we prove the 
lemma for real vectors, and not complex ones. The extension 
to complex vector is straight forward and thus is omitted 
here. We first note that we have to find the matrix X such 
that Tr H X - X 11 X - X 1 I is minimized. We note the 



following identities. 



XX^ = XI E aittj v,vf VjvJ = X a- v,vf 

1— 1 j — 1 i—1 

Tr{XX^}=f]a? 

i=i 

q+1 q+1 q+1 p 

i—1 j — 1 i—1 j—q+1 

p q+1 p p 

i=q+l j = l i=q+l j=q+l 

Tr{xX^}^X:E'^'.K-.)'+E E ^^Kcfc,] 

i^i j^i i^i j^q-\-i 



i=q+l j = l i=q+l j=q+l 

^ p q P P 

XX^ = E E a»'jv,vf CjcJ + X E o■^lv^vfcJcJ 

i=l j=l i—1 j — q-\-l 

Tr{xX^}^±±a,l,{.Jc,f + ± ± a.^ (vf c 

i=l j = l i=l j=q+l 

By using these identities, Tr j (x - x) (x - x) I can be 
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expressed as follows: 

A 



i? = Tr I (^X - Xj [XXj 
= Tr{XX^} -2Tr|xX^| +Tr|xX'^| = 

9+1 



i—1 i=l j=l 

q+1 q+1 



i=i j=i 

P 9+1 

- E E^'. 

i=g+l j=l 



E w 

i=l j=q+l 

P P 

+ E E ^ 

»=9+l j=9+l 



=1 j=q+l 



At the second part of the (n + l)th iteration, Wg^„+i is 
constructed as follows, 

Wq,„+i = diag - - o-^Vivf + allj 

= diag «+i + w,.„) . (26) 
The total error, between R and the estimate is 

Er,+i = R - diag (wq,„) - ^{k - a2)vivf 

i=l 

- diag (w,,„+i) = - diag (K+i) ■ (27) 



2 /'„T, 



• (24)Hgj^^g Tr {£;„+! J = E.,, K+i]^. [^n+i]^ = 

The derivatives of i? with respect to the unknown parameters Ti {E'^_^_iE!^i} < Tr {EnE^}, which concludes the 
are given by the following: proof. 
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Fig. 1 . Three-user scenario, no mismatch. ProbabiUty of correct decision as 
a function of the number of snapshots. 
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dck 



= ~4 E {^f^k) Cj + 4 ^ kl (cf Cfc) Ci 



i=l 



GMDL Gauss source 
RMDL Gauss source 
GMDL Laplace source 
RMDL Laplace source 



+4/2 (c^cfe) Cfe + 4 i^I^k) Ci 

i=q+l,i^k 

It is now easy to verify that by substituting into the above 
equations the proposed solution and exploiting the fact 
that {vi} is an orthonormal bases, all the derivatives are 
equal to zero, and hence the proposed solution minimizes 

TV{(X-X) (X-X)"}. 

According to the algorithm, at the beginning of the (n+l)th 
iteration the following matrix is created, ^i^i{h—cr^)'Vivf + 
ol^ = ELi ^^v.vf + al EL,+i v.vf , where Zi > ■ • • > 
Ip and Vi , . . . , Vp are, respectively, the eigenvalues and the 
corresponding eigenvectors of the matrix 

■f^ ~ diag (Wg^„) = En + Ag^„Rs^g^„A^^ + u^^^^ (25) pjg 2. Three-user scenario, weak mismatch. ProbabiUty of correct decision 

as a function of the number of snapshots. 

and cr^ = Er=q+i ^i- Denote by £^4+1 the error between 
R-diag (wq,„) and E?=i(^i-o-«)vjvf +cr2l; that is E'^^-y = 
R - diag (w,_„) - Y^i^iik - cr1)wiwf - all. According to 
the Lemma 6 TV{K+iE;^J < 'IV{^„^^}. 




FISHLER AND POOR: ROBUST ESTIMATION OF THE NUMBER OF SOURCES 




Fig. 5. Three-user scenario, strong mismatch. Probability of correct decision 
as a function of the spatial separation between the sources 



